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In  this  work  we  have  pursued  the  development  of  parallel  algorithms  for  matrix  compu¬ 
tations.  We  highlight  a  few  of  these  activities  below,  and  include  a  complete  list  of  the 
publications  resulting  from  this  research  grant. 

Bounds  on  Scaled  Projectors.  Scaled  projection  operators  arise  in  computations 
for  weighted  least  squares  problems,  linear  programming  algorithms,  and  other  appli¬ 
cations.  Let  A'  be  a  matrix  of  full  column  rank  and  let  B  be  a  diagonal  matrix  with 
positive  diagonal  elements.  We  have  shown  that  the  weighted  pseudo-inverse  defined 

by  =  {X^ DX)~^X^ D  and  the  associated  oblique  projection  Pd  =  A'AT^  r,  have 
norms  bounded  by  numbers  that  are  independent  of  D. 

Iteratively  Reweighted  Least  Squares.  We  studied  various  algorithms  to  obtain 
estimates  of  solution  vectors  and  residual  vectors  for  the  linear  model  Ax  =  b  +  e  =  btrue 
using  an  iteratively  reweighted  least  squares  criterion,  which  tends  to  diminish  the  in¬ 
fluence  of  outliers  compared  with  the  standard  least  squares  criterion.  Algorithms  ap¬ 
propriate  for  dense  and  sparse  matrices  were  developed.  Solving  Newton’s  linear  system 
using  updated  matrix  factorizations  or  the  (unpreconditioned)  conjugate  gradient  iter¬ 
ation  gave  the  most  effective  algorithms.  Four  weighting  functions  were  compared,  and 
results  were  obtained  for  sparse  well-conditioned  and  ill-conditioned  problems. 

Subspace  updating.  An  important  problem  in  array  signal  processing  is  to  de¬ 
termine  the  null  space  of  a  matrix  of  signals  sampled  at  discrete  times  by  an  array  of 
m  sensors.  It  is  necessary  that  the  subspace  be  updated  in  real  time.  The  customary 
approach  has  been  through  the  singular  value  decomposition;  however,  this  decom¬ 
position  cannot  be  updated  exactly  in  fewer  than  O(m^)  operations,  and  all  parallel 
algorithms  proposed  for  approximate  updating  require  0{vn})  processors.  Recently  we 
have  introduced  a  new  decomposition  —  the  URV  decomposition  —  which  can  be  up¬ 
dated  sequentially  in  O(m^)  operations,  and  in  parallel  in  0(m)  operations  using  0{m) 
processors.  This  decomposition  has  applications  in  other  areas  of  signal  processing.  We 
are  investigating  a  sequential  implementation  of  the  method.  A  parallel  implementation 
has  been  debugged  on  a  simulator,  we  are  preparing  to  move  it  to  the  IWARP. 

Parallel  QR  factorization.  A  project  on  parallel  QR  factorization  hcis  been  com¬ 
pleted.  A  parallel  Gram-Schmidt  algorithm  and  a  parallel  Householder  algorithm  have 


been  developed  and  programmed,  and  analytical  models  for  the  time  complexity  of 
these  algorithms  have  been  developed.  The  models  were  validated  over  a  wide  range  of 
parameter  values  for  floating  point  and  communication  speed  through  experiments  on 
the  ZMOB,  MCMOB,  a  16  processor  Butterfly  with  hardware  floating  point,  and  a  128 
processor  Butterfly  with  software  floating  point. 

Interprocessor  communication.  It  has  long  been  known  that  there  is  a  close 
relationship  between  granularity  of  communication  and  granularity  of  computation. 
Roughly  speaking  the  coarser  the  granularity  of  communication,  the  coarser  the  gran¬ 
ularity  of  computation  must  be  to  compensate.  In  a  paper  to  appear  in  Parallel  Com¬ 
puting,  we  investigate  this  phenomenon  theoretically  and  empirically.  The  conclusion  is 
that  to  solve  the  large  problems  that  tomorrow’s  generation  of  parallel  computers  can 
hold,  we  must  have  fine  grained  communication. 

Polynomial  preconditioners  for  conjugate  gradient  algorithms.  Precondi¬ 
tioning  to  produce  a  more  favorable  distribution  of  eigenvalues  is  essential  in  using  the 
conjugate  gradient  algorithm  to  solve  linear  systems  of  equations.  The  choice  of  pre¬ 
conditioner  must  be  well  matched  to  the  problem  and  to  the  computer  architecture. 
Polynomial  preconditioning  is  a  useful  tool  in  the  effective  use  of  the  conjugate  gra¬ 
dient  algorithm  on  special  architectures  such  as  message  passing  parallel  computers, 
machines  with  hierarchical  memory,  vector  processors,  and  machines  with  very  limited 
memory.  We  have  developed  a  new  adaptive  algorithm  that  uses  a  polynomial  based  on 
the  residual  polynomial  from  k  steps  of  the  conjugate  gradient  algorithm.  This  precon¬ 
ditioning  requires  no  prior  information  about  the  matrix  and  is  efficient  on  a  \’ariety  of 
architectures. 


Eigenvalues  of  Arrowhead  Matrices.  A  query  from  a  physicist  led  us  to  consider 
the  problem  of  finding  the  eigenvalues  and  eigenvectors  of  a  symmetric  matrix  with 
nonzeroes  only  in  the  main  diagonal  and  the  last  row  and  column.  A  highly  parallel 
algorithm  was  developed  and  an  error  analysis  wais  completed. 

Projection  methods  for  eigenvalue  problems.  Projection  methods,  sucli  as 
Kaezmarz’s  algorithm  are  promising  for  very  large  eigenvalue  problems,  since  they  use 
comparatively  little  storage  and  access  the  matrix  only  one  row  at  a  time.  Kaezmarz’s 
method  for  inhomogeneous  systems  has  been  extensively  analyzed.  In  spite  of  this, 
very  little  is  known  about  the  rate  of  convergence  of  the  method.  Using  a  relation 
between  Kaezmarz’s  method  and  SOR,  we  conjecture  that  the  convergence  rate  should 
be  approximately 
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where  is  the  second  smallest  singular  value  oi  A  —  XI  and  X  is  the  eigem’alue  whose 
eigenvector  is  to  be  found. 
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Markov  chains.  Although  the  theory  of  Gaussian  elimination  for  the  solution 
of  Markov  chains  insures  that  it  will  perform  well  on  chains  with  balanced  transition 
probabilities,  the  method  fails  for  nearly  uncoupled  Markov  chains.  We  have  given  a 
formal  rounding-error  u,nalysis  of  a  variant  of  Gaussian  elimination  which  works  for 
these  chains. 

Two  students  completed  Ph.D.  theses  in  the  study  of  Markov  chains.  Xiaobai  Sun 
presented  a  unified  framework  for  studying  various  aggregation  algorithms  and  used  it  to 
study  the  convergence  rate  of  these  algorithms.  Pil  Park  studied  iterative  algorithms  for 
overflow  queuing  networks,  and  achieved  good  results  with  a  combination  of  projection 
and  preconditioning. 

Perturbation  theory.  Six  problems  in  matrix  perturbation  theory  have  emerged 
from  our  work.  They  concern  the  condition  of  nearly  uncoupled  Markov  chains,  the 
computation  of  residual  bounds  for  eigenvalues,  the  computation  of  residual  bounds 
for  singular  values,  the  condition  of  multiple  eigenvalues,  the  perturbations  of  nearly 
transient  Markov  drains,  and  the  perturbation  of  matri.x  factorizations.  Satisfactory 
solutions  of  these  problems  have  been  found. 

Constrained  matrix  Sylvester  equations.  The  matrix  Sylvester  problem  of 
finding  a  matrix  T  to  satisfy  AT  ATF  —  C  arises  in  applications  in  control  problems 
involving  the  solution  of  ordinary  differential  equations.  In  the  design  of  reduced  order 
observers  for  loop  transfer  recovery,  the  solution  T  is  further  constrained  to  satisfy 
TB  =  0,  but  the  matrix  C  is  the  product  of  two  matrices,  on;  to  be  determined. 
Questions  of  existence  and  uniqueness  of  solutions  to  such  problems  have  been  studied, 
and  a  computational  algorithm  has  been  developed  and  tested,  and  applied  to  design 
of  reduced  order  observers  that  achieve  loop  transfer  recovery  in  aircraft. 

Publication,  etc. 

Technical  reports 

1.  G.  W.  Stewart,  “An  Iterative  Method  for  Solving  Linear  Inequalities,”  CS-TR- 
1833,  University  of  Maryland,  April,  1987 

This  paper  describes  a  method  for  solving  homogeneous  linear  inequalities.  The 
numerical  techniques  required  by  the  algorithm  can  be  parallelized  and  are  im¬ 
portant  in  a  number  of  other  applications. 

2.  Gene  H.  Golub  and  Dianne  P.  O’Leary,  “Some  History  of  the  Conjugate  Gradient 
and  Lanczos  Algorithms:  1948-1976,”  CS-TR-1859  and  UMIACS-TR-87-20,  Uni¬ 
versity  of  Maryland,  June,  1987. 

This  manuscript  gives  some  of  the  history  of  the  conjugate  gradient  and  Lanczos 
algorithms  and  an  annotated  bibliography  for  the  period  1948-1976. 
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3.  Robert  A.  van  de  Geijn,  “Implementing  the  QIl-Algorithm  on  an  Array  of  Pro¬ 
cessors,”  CS-TR-1897,  University  of  Maryland,  August,  1987. 

QR-aJgorithms  for  solving  the  algebraic  eigenvalue  problem  that  initially  reduce 
the  matrix  to  upper  Ilessenberg  form  and  utilize  traditional  shifting  strategies  do 
not  lend  themselves  to  efficient  implementation  on  a  grid  of  proceSi.ors.  In  this 
thesis,  we  introduce  a  variation  of  the  QR-algorithm  that  works  with  the  full  ma¬ 
trix  and  show  how  it  can  be  implemented  on  a  square  array  of  processors.  By 
using  a  deferred  shifting  scheme,  iterations  can  be  pipelined,  thereby  reducing 
processor  idle  time.  A  thorough  analysis  of  deferred  shifting  techniques  show  that 
the  asymptotic  convergence  rate  remains  acceptable. 

4.  Zhaojun  Bai,  “The  Direct  GSVD  Algorithm  and  Its  Parallel  Implementation,” 
CS-TR-1901,  University  of  Maryland,  August,  1987. 

The  generalized  singular  value  decomposition  (GSVD)  is  the  simultaneous  reduc¬ 
tion  of  any  two  matrices  having  the  same  number  of  columns  to  diagonal  matrices 
by  premultiplying  by  two  different  orthogonal  matrices  and  postmultiplying  by 
the  same  nonsingular  matrix.  Following  the  work  of  C.  C.  Paige  on  the  sequential 
Jacobi-like  GSVD  algorithm,  we  first  provide  a  clearer  description  of  his  algorithm 
and  a  more  straightforward  proof.  A  new  version  of  the  direct  GSVD  algorithm, 
called  the  direct  GSVD  algorithm,  is  given.  The  error  analysis  shows  that  the 
algorithm  is  stable  in  the  presence  of  rounding  errors. 

A  parallel  implementation  of  the  direct  GSVD  algorithm  is  proposed  in  the  second 
part  of  the  paper.  The  parallel  algorithm  falls  into  two  parts.  The  first  is  that  the 
input  matrices  are  preprocessed  in  parallel  by  computing  their  upper  trapezoidal 
forms,  for  which  we  develop  a  parallel  QR  decomposition  with  column  pivoting. 
The  second  is  the  parallel  computations  of  the  GSVD  of  two  upper  trapezoidal 
matrices. 

Finally,  the  direct  GSVD  algorithm  is  used  to  derive  an  efficient  method  for  the 
problem  of  equality-constrained  least  squares,  and  show  how  effective  the  GSVD  is 
in  dealing  with  a  linearly  constrained  Gauss-Markov  model.  The  GSVD  provides 
an  efficient  algorithm  and  reveals  the  structure  of  the  model  more  clearly  than  the 
usual  general  inverse  expressions. 

5.  Zhaojun  Bai  “Data-flow  Algorithm  for  Computing  the  Singular  Value  Decompo¬ 
sition,”  CS-TR-1941  University  of  Maryland,  August,  1987. 

In  this  report,  a  data-flow  algorithm  for  computing  the  singular  value  decompo¬ 
sition  (SVD)  of  any  matrix  is  developed.  It  contains  the  description  of  a  prepro¬ 
cession  QRD  algorithm  for  QR  decomposition  and  SVD  algorithm  for  the  SVD  of 
a  triangular  matrix.  Asymptotic  convergence  rate  in  parallel  odd-even  ordering 
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is  examined.  Moreover,  the  computational  network,  assignment  of  the  computa- 
tiontd  nodes,  and  data  communication  complexity  are  also  presented. 


6.  Zhaojun  Bai,  “Numerical  Treatment  of  Restricted  Gauss-Markov  Model,”  CS-TR- 
1942  University  of  Maryland,  August,  1987. 

The  singular  value  decomposition  has  been  widely  used  in  the  ordinary  linear 
model  and  other  statistical  problems.  In  this  paper,  we  shall  introduce  the  gen¬ 
eralized  singular  value  decomposition  of  any  two  matrices  A'  and  11  having  the 
same  number  of  columns  to  treat  large  scale  restricted  Gauss-Markov  models 
{y,Xbeta  =  r,  sigma? I).  This  method  leads  to  an  efficient,  numerically  stable 
and  easily  programmed  algorithm  for  the  best  linear  unbiased  estimator  in  large 
scale  restricted  Gauss-Markov  linear  models.  The  added  information  and  sensi¬ 
tivity  that  it  provides  may  be  very  useful  in  understanding  the  problem. 

7.  G.  W.  Stewart,  “The  Crab:  a  Dialogue,”  University  of  Maryland  CS-TR-2025, 
May  1988. 

The  CRAB  is  a  module  for  building  memory  connected  parallel  systems  in  which 
processors  that  are  connected  to  one  another  can  read  and  write  eaclr  other’s 
memory.  This  arrangement  solves  some  of  the  communication  problems  associated 
with  more  conventional  message  passing  systems.  In  particular,  it  is  possible  for 
a  line  of  processors  to  pipeline  data,  altering  the  data  items  as  they  pass  by.  This 
report  contains  a  dialogue  about  the  crab,  describing  what  it  is  and  what  it  can 
do. 

8.  G.  W.  Stewart,  “On  Scaled  Projections  and  Pseudo-Inverses,”  CS-TR  2026,  May, 
1988. 

Let  AT  be  a  matrix  of  full  column  rank  and  let  Z?  be  a  diagonal  matrix  with  positive 
diagonal  elements.  The  weighted  pseudo-inverse  defined  by  =  (X^DX)~^X^D 

and  the  associated  oblique  projection  Pq  =  A'A'^  arise  in  many  applications.  In 
this  paper,  we  show  that  the  norms  of  both  matrices  are  bounded  by  numbers 
that  are  independent  of  D. 

9.  Dianne  P,  O’Leary  and  Peter  Whitman,  “Parallel  QR  Factorization  by  House¬ 
holder  and  Modified  Gram-Sclimidt  Algorithms, “  CS-TR  2119,  UMIACS-TR-SS- 
78,  October,  1988. 

In  this  paper,  the  parallel  implementation  of  two  algorithms  for  forming  the  QR 
factorization  of  a  matrix  is  studied.  We  propose  parallel  algorithms  for  the  mod¬ 
ified  Gram-Schmidt  and  the  Householder  algorithms  on  message  passing  systems 
in  which  the  matrix  is  distributed  by  blocks  of  rows.  The  models  that  predict 
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performance  of  tlie  algoritlims  are  validated  by  experimental  results  on  several 
parallel  machines. 


10.  G.  W.  Stewart,  “Stochastic  Perturbation  Theory,”  CS-TR  2129,  UMIACS-TR-88- 
76,  October,  1988. 

In  this  paper,  classical  matrix  perturbation  theory  is  approached  from  a  proba¬ 
bilistic  point  of  view.  The  perturbed  quantity  is  approximated  by  a  first  order 
perturbation  expansion,  in  which  the  perturbation  is  assumed  to  be  random.  This 
permits  the  computation  of  statistics  estimating  the  variation  in  the  perturbed 
quantity.  Up  to  the  higher  order  terms  that  are  ignored  in  the  expansion,  these 
statistics  tend  to  be  more  realistic  than  perturbation  bounds  obtained  in  terms  of 
norms.  The  technique  is  applied  to  a  number  of  problems  in  matrix  perturbation 
theory,  including  least  squares  and  the  eigenvalue  problem. 

11.  G.  W.  Stewart,  “Communication  and  Matrix  Computations  on  Large  Message 
Passing  Systems,”  CS-TR-2135,  University  of  Maryland,  October,  1988. 

This  paper  is  concerned  with  the  consequences  for  matrix  computations  of  having 
a  rather  large  number  of  general  purpose  processors,  say  ten  or  twenty  thousand, 
connected  in  a  network  in  such  a  way  that  a  processor  can  communicate  only 
with  its  immediate  neighbors.  Certain  communication  taisks  associated  with  most 
matrix  algorithms  are  defined  and  formulas  developed  for  the  time  required  to 
perform  them  under  several  communication  regimes.  The  results  are  compared 
with  the  times  for  a  nominal  floating  point  operations.  The  results  suggest 
that  it  is  possible  to  use  a  large  number  of  processors  to  solve  matrix  problems  at 
a  relatively  fine  granularity. 

12.  D.  P.  O’Leary  and  G.  W.  Stewart,  “Computing  the  Eigenvalues  and  Eigenvectors 
of  Arrowhead  Matrices,”  CS-TR-2203,  University  of  Maryland,  February  1989. 
This  paper  treats  the  eigenvalue  problem  for  a  symmetric  matrix  which  is  zero 
except  for  its  main  diagonal  and  one  row  and  column.  Such  problems  arise  in 
the  description  of  radiationless  transitions  in  isolated  molecules  and  of  oscillators 
vibrationally  coupled  with  a  Fermi  liquid.  In  these  applications  the  order  n  of 
the  matrix  A  can  be  in  the  thousands.  The  purpose  of  this  paper  is  to  present 
formulas  and  efficient  and  highly  parallel  algorithms  for  computing  eigen\’alues 
and  eigenvectors  of  such  matrices. 

13.  Dianne  P.  O’Leary,  “On  Bounds  for  Scaled  Projections  and  Pseudo-Inverses,” 
UMIACS-TR-89-32  CS-TR-2215,  University  of  Maryland  February,  1989. 

Let  X  be  a  matrix  of  full  column  rank  and  let  D  be  a  positive  definite  diagonal 
matrix.  In  a  recent  paper,  Stewart  considered  the  weighted  pseudo-inverse  Xj-,  = 
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(X^DX)  ^ X^ D  and  the  associated  oblique  projection  Pq  =  A'A't,,  and  gave 
bounds,  independent  of  D,  for  the  norms  of  these  matrices.  In  this  note,  we 
answer  a  question  lie  raised  by  showing  that  the  bounds  are  computable. 

14.  Dianne  P.  O’Leary,  “On  Iteratively  Reweighted  Linear  Least  Squares  Problems,” 
CS-TR-2278,  UMIACS  89-66,  University  of  Maryland,  July,  1989. 

Several  variants  of  Newton’s  method  are  used  to  obtain  estimates  of  solution  vec¬ 
tors  and  residual  vectors  for  the  linear  model  Ax  =  h-\-e  =  using  an  iteratively 
reweighted  lease  squares  criterion,  which  tends  to  diminish  the  influence  of  outliers 
compared  with  the  standard  least  squares  criterion.  Algorithms  appropriate  for 
dense  and  sparse  matrices  are  presented.  Solving  Newton’s  linear  system  using 
updated  matrix  factorizations  or  the  (unpreconditioned)  conjugate  gradient  iter¬ 
ation  gives  the  most  effective  algorithms.  Four  weighting  functions  are  compared, 
and  results  are  given  for  sparse  well-conditioned  and  ill-conditioned  problems. 

15.  G.  W.  Stewart,  “Perturbation  Theory  and  Least  Squares  with  Errors  in  the  Vari¬ 
ables,”  UMIACS-TR-89-97,  CS-TR  2326,  University  of  Maryland,  October,  1989. 

In  this  note  we  examine  what  matrix  perturbation  theory  has  to  say  about  or¬ 
dinary  least  squares  estimation  when  the  regression  matrix  is  contaminated  by 
random  errors.  The  conclusion  is  that  there  is  a  regime  in  which  the  errors  can 
have  important,  even  overwhelming  effects  yet  do  not  affect  the  validity  of  or¬ 
dinary  least  squares  procedures.  The  boundary  of  this  regime  is  indicated  by  a 
diagnostic  number  which  can  be  calculated  from  the  data. 

16.  G.  VV.  Stewart,  “Two  Simple  Residual  Bounds  for  Eigenvalues  of  Hermitian  Ma¬ 
trices,”  UMIACS-TR-89-123,  CS-TR  2364,  December,  1989. 

Let  A  be  Hermitian  and  let  the  orthonormal  columns  of  X  span  an  approximate 
invariant  subspace  of  A''.  Then  the  residual  R  =  AX  —  XM  {M  —  A'^,Y)  will  be 
small.  The  theorems  of  this  paper  bound  the  distance  of  the  spectrum  of  M  from 
the  spectrum  of  A  in  terms  of  appropriate  norms  of  R. 

17.  G.  W.  Stewart,  “On  the  Sensitivity  of  Nearly  Uncoupled  Markov  ChaJiis,”  UMIACS- 
TR-90-18,  CS-TR  2406,  February,  1990. 

Nearly  uncoupled  Markov  chains  (aka  nearly  completely  decomposable  Markov 
chains)  arise  in  a  variety  of  applications,  where  they  model  loosely  coupled  sys¬ 
tems.  In  such  systems  it  may  be  difficult  to  determine  the  transitions  probabilities 
with  high  accuracy.  This  paper  investigates  the  sensitivity  of  the  limiting  distribu¬ 
tion  of  the  chain  to  perturbations  in  the  transition  probabilities.  The  conclusion 
is  that  nearly  uncoupled  Markov  chains  are  quite  sensitive  to  such  perturbations 
but  the  perturbation  of  the  limiting  distribution  is  not  arbitrary. 
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18.  G.  W.  Stewart  and  G.  Zhang,  “Eigenvalues  of  Graded  Matrices  and  the  Condition 
Numbers  of  a  Multiple  Eigenvalue,”  UMIACS  TR  90-31,  CS  TR  2420,  February 
1990. 

This  paper  concerns  two  closely  related  topics:  the  behavior  of  the  eigenvalues 
of  graded  matrices  and  the  perturbation  of  a  nondefective  multiple  eigenvalue. 
We  will  show  that  the  eigenvalues  of  a  graded  matrix  tend  to  share  the  graded 
structure  of  the  matrix  and  give  precise  conditions  insuring  that  this  tendency  is 
realized.  These  results  are  then  applied  to  show  that  the  secants  of  the  canonical 
angles  between  the  left  and  right  invariant  subspaces  of  a  multiple  eigenvalue  tend 
to  characterize  its  behavior  when  its  matrix  is  slightly  perturbed. 

19.  Dianne  P.  O’Leary,  “Yet  Another  Polynomial  Preconditioner  for  the  Conjugate 
Gradient  Algorithm,”  CS-TR-2425,  UMIACS  90-36,  University  of  Maryland,  March, 
1990. 

Polynomial  preconditioning  is  a  useful  tool  in  the  effective  u.'e  of  the  conjugate 
gradient  algorithm  on  special  arcliitectures  such  as  mess"  ^sing  parallel  com¬ 
puters,  machines  with  hierarchical  memory,  vector  pr  i  rs,  and  machines  with 
very  limited  memory.  In  this  work  we  investigate  the  use  of  a  new  adaptive  algo¬ 
rithm  wliich  uses  the  polynomial  preconditioner  bcised  on  the  residual  polynomial 
from  k  steps  of  the  conjugate  gradient  algorithm. 

20.  G.  W.  Stewart,  “An  Updating  Algorithm  for  Subspace  Tracking,”  UMIACS-TR- 
90-86,  CS-TR  2494,  July  1990. 

In  certain  signal  processing  applications  it  is  required  to  compute  the  null  space 
of  a  matrix  whose  rows  are  samples  of  a  signal.  The  usual  tool  for  doing  this  is 
the  singular  value  decomposition.  However,  the  singular  \'alue  decomposition  has 
the  drawback  that  it  requires  0{p^)  operations  to  recompute  when  a  new  sample 
arrives.  In  this  paper,  we  show  that  a  different  decomposition,  called  the  URV, 
decomposition  is  equally  effective  in  exhibiting  the  null  space  and  can  be  updated 
in  O(p^)  time.  The  updating  technique  can  be  run  on  a  linear  array  of  p  processors 
in  0{p)  time. 

21.  G.  W.  Stewart  and  G.  Zhang,  “On  a  Direct  Method  for  the  Solution  of  Nearly- 
Uncoupled  Markov  Chains,”  UMIACS  TR  90-95,  CS  TR  2504,  July  1990. 

This  note  is  concerned  with  the  accuracy  of  the  solution  of  nearly  uncoupled 
Markov  chains  by  a  direct  method  based  on  the  LU  decomposition.  It  is  shown 
that  plain  Gaussian  elimination  may  fail  in  the  presence  of  rounding  errors.  A 
modification  of  Gaussian  elimination  with  diagonal  pivoting  as  well  as  corrections 
of  small  pivots  by  sums  of  off-diagonal  elements  in  the  pivoting  columns  is  proposed 
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and  analyzed.  It  is  shown  that  the  accuracy  of  the  solution  is  affected  by  two 
condition  numbers  associate  with  the  aggregate  and  the  coupling  respectively. 

22.  G.  W.  Stewart,  “Perturbation  Theory  for  the  Singular  Value  Decomposition,” 
UMIACS-TR-90-124,  CS-TR  2539,  September,  1990. 

The  singular  value  decomposition  has  a  number  of  applications  in  digital  signal 
processing.  However,  the  the  decomposition  must  be  computed  from  a  matri.x 
consisting  of  both  signal  and  noise.  It  is  therefore  important  to  be  able  to  assess 
the  effects  of  the  noise  on  the  singular  values  and  singular  vectors  —  a  problem  in 
classical  perturbation  theory.  In  this  paper  we  survey  the  perturbation  theory  of 
the  singular  value  decomposition. 

23.  Jewel  B.  Barlow,  Moghen  M.  Monahemi,  and  Dianne  P.  O'Leary,  “Constrained 
matrix  Liapunov  equations,”  UMIACS-91-16,  CS-2599,  January  1991. 

We  consider  the  problem  of  finding  matrices  Z,  and  T  satisfying  TA—i^T  =  TC  and 
TD  =  Q.  We  establish  existence  conditions  for  the  solution,  derive  an  algorithm  for 
computing  the  solution,  and  discuss  conditions  under  which  the  matrix  [C^,T^] 
is  full  rank.  The  problem  arises  in  control  theory,  in  the  design  of  reduced  order 
observers  that  achieve  loop  transfer  recovery. 

24.  Jewel  B.  Barlow,  Moghen  M.  Monahemi,  and  Dianne  P.  O’Leary,  “The  design  of 
reduced-order  Luenberger  observers  with  precise  LTR,”  UMIACS-91-17,  CS-2600, 
January  1991. 

This  work  concerns  the  design  of  reduced-order  observers  for  controllable,  observ¬ 
able,  and  regular  systems  in  which  the  number  of  measurements  is  more  than  the 
number  of  controls.  It  uses  eigenstructure  assignment  whereas  other  approaches 
use  Kalman  filter  (LQG/LTR)  methods.  The  advantages  of  this  approach  are 
precise  rather  than  approximate  LTR,  no  restriction  to  minimum  phase  systems, 
finite  rather  than  infinite  observer  gain,  and  simpler  and  more  efficient  numerical 
calculation.  Case  studies  are  presented  illustrating  these  features. 

25.  G.  W.  Stewart  ”On  an  Algorithm  for  Refining  a  Rank-Revealing  URV  Factoriza¬ 
tion  and  a  Perturbation  Theorem  for  Singular  Values”  UMIACS-TR-01-38,  CS-TR 
2626,  March  1991 

In  this  note  we  consider  an  iterative  algorithm  for  moving  a  triangular  matrix 
toward  diagonality.  The  method  is  shown  to  converge  under  conditions  that  are 
likely  to  be  met  in  practice.  A  result  of  the  convergence  proof  in  a  new  perturba¬ 
tion  theorem  for  singular  values. 

26.  G.  W.  Stewart  ’’Updating  A  Rank- Revealing  ULV  Decomposition”  UMIACS-TR- 
91-39,  CS-TR  2627,  March  1991 
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A  ULV  dccomposUion  of  a  matrix  A  of  order  7i  is  a  decomposition  of  the  form 
A  =  ULV^^,  wliere  U  and  V  are  orthogonal  matrices  and  Z  is  a  lower  triangular 
matrix.  When  A  is  approximately  of  rank  /c,  the  decomposition  is  rank  revealing 
if  the  last  n  —  k  rows  of  Z  are  small.  This  paper  presents  algorithms  for  updating 
a  rank-revealing  ULV  decomposition.  The  algorithms  run  in  O(n^)  time,  and  can 
be  implemented  on  a  linear  array  of  processors  to  run  in  0{n)  time 

27.  G.  Aaams,  M.  F.  Griffin,  and  G.  W.  Stewart  ”Direction-of-Arrival  Estimation  Us¬ 
ing  the  Rank- Revealing  URV  Decomposition”  UMIACS-TR  91-46,  CS-TR-2640, 
March  1991 

An  algorithm  for  updating  the  null  space  of  a  matri.x  is  described.  The  algorithm 
is  based  on  a  new  decomposition,  called  the  URV  decomposition,  which  can  be 
updated  in  O(N^)  and  servos  as  an  intermediary  between  the  QR  decomposition 
and  the  singular  value  decomposition.  The  URV  decomposition  is  applied  to  a 
high-resolution  direction  of  arrival  problem  based  on  the  MUSIC  algorithm.  A 
virtue  of  the  updating  algorithm  is  the  running  estimate  of  rank. 

28.  G.  W.  Stewart  "Error  Analysis  of  QR  Updating  with  Exponential  Windowing” 
UMIACS-TR-91-79,  CS-TR  2685,  May  1991 

Exponential  windowing  is  a  widely  used  technique  for  suppressing  the  effects 
of  old  data  as  new  data  is  added  to  a  matrix.  Specifically,  given  an  n  x  p 
matrix  A'n  and  a  “forgetting  factor”  /?  €  (0,1),  one  works  with  the  matrix 
diag(/?"“^,/3”'“^, . . . ,  1)A'„.  In  this  paper  we  examine  an  updating  algorithm  for 
computing  the  QR  factorization  of  diag(/8"~^,/?"~^, . . .,  1)A„  and  show  that  it  is 
unconditionally  stable  in  the  presence  of  rounding  errors. 

29.  Pill  Seong  Park,  “Iterative  Solution  of  Sparse  Singular  Systems  of  Equations  Aris¬ 
ing  from  Queuing  Networks,”  UMIACS-TR-91-83,  CS-TR-2690,  June  1991  • 
Iterative  methods  for  solving  large  sparse  singular  systems  Ax  =  0  arising  from 
queuing  networks  having  overflow  capacity  are  presented.  For  such  overflow  mod¬ 
els,  no  analytic  solution  exists  and  the  Kolmogorov  balance  equation  has  to  be 
solved  explicitly.  The  resulting  matrix  is  irreducible,  non-symmetric,  and  has  a 
one  dimensional  null  space.  However,  the  matrix  is  sparse,  highly  structured,  and 
has  property-A. 

We  transform  the  problem  into  an  eigenvalue  problem  Tx  =  x,  where  the  eigen¬ 
vector  corresponding  to  the  eigenvalue  1  is  the  desired  solution.  The  choice  of  the 
Jacobi  iteration  leads  to  a  2-cyclic  algorithm  which  reduces  the  necessary  amount 
of  work  by  1/2,  and  computation  of  the  residual  needs  no  extra  work. 

Inspired  by  the  similarity  between  Markov  models  of  queuing  networks  and  the 
grid  problems  arising  from  discretization  of  partial  differential  equations,  a  few 
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aggrcgalion/disaggrcgalioii(A/D)  type  methods  with  some  ideas  from  geometric 
imiltigrid  methods  are  considered. 

More  effective  methods  to  accelerate  convergence  of  the  2-cyclic  algorithm  are 
discussed.  The  methods  employ  orthogonal  projectors  onto  the  subspace  spanned 
by  dominant  eigencomponents  in  the  residual.  The  projection  step  is  further  re¬ 
fined  by  Arnoldi’s  method  without  extra  matrix-vector  multiplication  by  using 
the  results  of  power  iterations.  Adopting  the  Chebyshev  iterations  as  a  main 
driving  force  combined  with  power  iterations  and  projection  steps  refined  by 
Arnoldi’s  method,  the  resulting  hybrid  algorithm  outperforms  the  Chebyshev  iter¬ 
ation  methods  with  optimal  parameters.  We  study  the  convergence  of  the  hybrid 
algorithm  and  look  for  conditions  when  the  projection  step  can  accelerate  conver¬ 
gence  of  the  underlying  method.  Numerical  experiments  provide  further  evidence 
that  the  methods  can  be  quite  efficient,  especially  for  harder  problems. 

30.  G.  W.  Stewart  ’’Perturbation  Theory  for  Rectangular  Matrix  Pencils”  UMI.4CS- 
TR-91-105,  CS-TR  2721,  July  1991 

The  theory  of  eigenv'alues  and  eigenvectors  of  rectangular  matrix  pencils  is  com¬ 
plicated  by  the  fact  that  arbitrarily  small  perturbations  of  the  pencil  can  cause 
them  disappear.  However,  there  are  applications  in  which  the  properties  of  the 
pencil  ensure  the  existence  of  eigenvalues  and  eigenvectors.  In  this  paper  it  is 
shown  how  to  develop  a  perturbation  theory  for  such  pencils. 

31.  Per  Christian  Hansen  and  Dianne  Prost  O’Leary,  “The  use  of  the  L-curve  in  the 
regularization  of  discrete  ill-posed  problems,”  UMIACS-91-142,  CS-27S1,  October 
1991. 

In  order  to  produce  reasonable  solutions  to  ill-posed  problems,  regularization 
algorithms  are  often  used.  The  L-curve  is  a  plot — for  all  valid  regularization 
parameters — of  the  size  of  the  regularize  d  solution  versus  the  size  of  the  corre¬ 
sponding  residual.  We  establish  two  main  results.  First  we  gi%'e  a  unifying  charac 
terization  of  various  regularization  methods  and  show  that  the  measurement  of 
“size”  is  dependent  on  the  particular  regularization  method  chosen;  for  e.xample, 
the  2-norm  is  appropriate  for  Tikhonov  regularization,  but  a  1-norm  in  the  coor¬ 
dinate  system  of  the  singular  value  decomposition  (SVD)  is  relevant  to  truncated 
SVD  regularization  .  Secondly,  we  propose  a  new  method  for  choosing  the  reg¬ 
ularization  parameter  based  on  the  L-curve,  and  show  how  this  method  can  be 
implemented  efficiently.  We  compare  the  method  to  generalized  cross  \'alidation 
and  demonstrate  that  our  new  method  is  more  robust  in  the  presence  of  correlated 
errors. 

32.  Chiou-Ming  Huang  and  Dianne  P.  O’Leary,  A  Krylov  Multisplitting  .Algorithm  for 
Solving  Linear  Systems  of  Equations  Inst,  for  Mathematics  and  Its  Applications, 
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University  of  Miimosota  I’reprint  992,  July  1992. 

We  consider  tlic  practical  implementation  of  Krylov  snbspace  methods  (conjugate 
gradients,  GMRES,  etc.)  for  parallel  computers  in  the  case  where  the  precondi¬ 
tioning  matrix  arises  from  a  multisplitting.  We  show  that  the  algorithm  can  be 
efficiently  implemented  by  dividing  the  work  into  tasks  that  generate  search  direc¬ 
tions  and  a  single  task  that  minimizes  over  the  resulting  subspace.  Each  task  is 
assigned  to  a  subset  of  processors.  It  is  not  necessary  for  the  minimization  task  to 
send  frequent  information  to  the  direction  generating  tasks,  and  this  leads  to  high 
utilization  with  a  minimum  of  synchronization.  We  study  the  convergence  prop¬ 
erties  of  various  forms  of  the  algorithm  and  present  results  of  numerical  e.xamples 
on  a  sequential  computer.  We  consider  the  practical  implementation  of  Krylov 
subspace  methods  (conjugate  gradients,  GMRES,  etc.)  for  parallel  computers  in 
the  case  where  the  preconditioning  matrix  arises  from  a  multisplitting.  We  show 
that  the  algorithm  can  be  efficiently  implemented  by  dividing  the  work  into  tasks 
that  generate  search  directions  and  a  single  task  that  minimizes  over  the  resulting 
subspace.  Each  task  is  assigned  to  a  subset  of  processors.  It  is  not  necessary  for 
the  minimization  task  to  send  frequent  information  to  the  direction  generating 
tasks,  and  this  leads  to  high  utilization  with  a  minimum  of  synchronization.  We 
study  the  convergence  properties  of  various  forms  of  the  algorithm  and  present 
results  of  numerical  examples  on  a  sequential  computer. 

33.  J.  Barlow,  M.  Monahemi,  and  D.  P.  O’Leary,  Constrained  Matrix  Sylvester  Equa¬ 
tions  Computer  Science  Department  Report  CS-?,  Institute  for  Advanced  Com¬ 
puter  Studies  Report  UMIACS-91-?,  University  of  Maryland,  1991.  We  consider 
the  problem  of  finding  matrices  L  and  T  satisfying  T A  —  FT  =  LC  and  TB  —  Q. 
We  establish  existence  conditions  for  the  solution,  derive  an  algorithm  for  com¬ 
puting  the  solution,  and  discuss  conditions  under  which  the  matrix  [C^,T^]  is 
full  rank.  The  problem  arises  in  control  theory,  in  the  design  of  reduced  order 
observers  that  achieve  loop  transfer  recovery. 

34.  J.  Barlow,  M.  Monahemi,  and  D.  P.  O'Leary,  The  Design  of  Reduced-Order 
Observers  with  Precise  Loop  Transfer  Recovery,  Computer  Science  Department 
Report  CS-?,  Institute  for  Admnced  Computer  Studies  Report  UMI.\CS-92-?, 
University  of  Maryland,  1991  This  paper  concerns  the  design  of  reduced-order  ob¬ 
servers  for  systems  in  which  the  number  of  measurements  is  more  than  the  number 
of  controls.  We  develop  an  algorithm  that  applies  to  regular  systems  that  have  no 
transmission  zeroes.  The  algorithm  uses  eigenstructure  assignment  whereas  other 
approaclies  use  Kalman  filter  methods.  The  adv'antages  of  this  approach  are  the 
following:  i)  precise  loop  transfer  recovery  rather  than  approximate  loop  transfer 
recovery,  ii)  finite  observer  gain  rather  than  asymptotic  observer  gain,  iii)  modest 
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computational  tools  and  operations  counts.  Ca.se  studies  are  presented  illustrating 
these  features. 

35.  Dianne  P.  O’Leary,  Iterative  Methods  for  Finding  the  Stationary  Vector  for  Markov 
Chains,  Institute  for  Mathematics  and  Its  Applications,  University  of  Minnesota, 
Preprint  932,  February  1992 

This  overview  concerns  methods  for  estimating  the  steady-state  vector  of  ergodic 
Markov  chains.  The  problem  can  be  cast  as  an  ordinary  eigenvalue  problem,  but 
since  the  eigenvalue  is  known,  it  can  equally  well  be  studied  as  a  nullspace  prob¬ 
lem  or  as  a  linear  system.  We  discuss  iterative  methods  for  each  of  these  three 
formulations.  Many  of  the  applications,  such  as  queuing  modeling,  have  special 
structure  that  can  be  exploited  computationally,  and  we  give  special  emphasis  to 
three  ideas  for  exploiting  this  structure:  decomposibility,  separability,  and  multi¬ 
level  aggregation.  Such  ideas  result  in  a  large  number  of  diverse  algorithms,  many 
of  which  are  poorly  understood. 

36.  K.J.R.  Liu,  G.W.  Stewart,  and  Y.-J.  J.  Wu,  URV  ESPRIT  for  Tracking  Time- 
Varying  Signals  Computer  Science  Department  Report  CS-?,  Institute  for  Ad¬ 
vanced  Computer  Studies  Report  UMIACS-92-?,  University  of  Maryland,  October 
1992. 

ESPRIT  is  an  algorithm  for  determining  the  fixed  directions  of  arrival  of  a  set 
of  narrowband  signals  at  an  array  of  sensors.  Unfortunately,  its  computational 
burden  makes  it  unsuitable  for  real  time  processing  of  signals  with  time-\'arying 
directions  of  arrival.  In  this  work  we  develop  a  new  implementation  of  ESPRIT 
that  has  potential  for  real  time  processing.  It  is  based  on  a  rank-revealing  URV 
decomposition,  rather  than  the  eigendecomposition  or  singular  value  decompo¬ 
sition  used  in  previous  ESPRIT  algorithms.  We  demonstrate  its  performance 
on  simulated  data  representing  both  constant  and  time-^-arying  signals.  We  find 
that  the  URV-based  ESPRIT  algorithm  is  effective  for  estimating  time-varying 
directions-of-arrival  at  considerable  computational  savings  over  the  SVD-based 
algorithm. 

37.  G.  W.  Stewart,  On  the  Perturbation  of  Markov  Chains  with  Nearly  Transient 
States,  UMIACS-TR-92-14,  CS-TR-2835,  January  1992. 

Let  A  be  an  irreducible  stochastic  matrix  of  the  form 

A=f  j"  fO. 

y  i421  i422  / 

If  £22  'vere  zero,  the  states  corresponding  to  A22  would  be  transient  in  the  sense 
that  if  the  steady  state  vector  is  partitioned  conformally  in  the  form  (r/^  yj) 
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then  =  0.  If  F.22  's  small,  then  will  be  small,  and  the  states  are  said  to 
be  nearly  transient.  It  this  paper  it  is  shown  that  small  relative  perturbations  in 
^11,  -^21 5  and  .422,  though  potentially  larger  than  yj,  induce  only  small  relative 
perturbations  in  yj . 

38.  G.  W.  Stewart,  On  the  Perturbation  of  LU,  Cholesky,  and  QR  Factorizations, 
UMIACS-TR-92-24,  CS-TR-2848,  February  1992 

In  this  paper  error  bounds  are  derived  for  a  first  order  expansion  of  the  LU  fac¬ 
torization  of  a  perturbation  of  the  identity.  The  results  are  applied  to  obtain 
perturbation  expansions  of  the  LU,  Cholesky,  and  QR  factorizations. 

39.  G.  W.  Stewart,  Updating  URV  Decompositions  in  Parallel,  UMIACS-TR-92-44, 
CS-TR-28S0,  April  1992 

A  URV  decomposition  of  a  matrix  is  a  factorization  of  the  matrix  into  the  product 
of  a  unitary  matrix  (U),  an  upper  triangular  matrix  (R),  and  another  unitary 
matrix  (V).  In  an  earlier  paper  [UMIACS-TR-90-86]  it  w’as  shown  how  to  update 
a  URV  decomposition  in  such  a  way  that  it  reveals  the  effective  rank  of  the  matrix. 
It  was  also  argued  that  the  updating  procedure  could  be  implemented  in  parallel 
on  a  linear  array  of  processors;  however,  no  specific  algorithms  were  given.  This 
paper  gives  a  detailed  implementation  of  the  updating  procedure. 

40.  Z.  Bai  and  G.  W.  Stewart,  SRRIT  —  A  FORTRAN  Subroutine  to  Calculate  the 
Dominant  Invariant,  Subspace  of  a  Nonsymmetric  Matrix,  UMIACS  TR-92-61, 
CS  TR-2908,  May,  1992 

SRRIT  is  a  FORTRAN  program  to  calculate  an  approximate  orthonormal  basis  for 
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with  m  orthonormal  columns  and  real  quasi-triangular  matrix  T  of  order  m  sucli 
that  the  equation 

AQ  =  QT 

is  satisfied  up  to  a  tolerance  specified  by  the  user.  The  eigemalues  of  T  are 
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of  applications.  The  usual  aj>proacii  is  to  compute  a  rank-revealing  decompo¬ 
sition  and  make  a  decision  about  the  rank  by  examining  the  small  elements  of 
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decomposition. 
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